Trading Replication for Communication in Parallel Distributed-Memory Dense Solvers
Abstract
We present new communication-efficient parallel dense linear solvers: a solver for triangular linear systems with multiple right-hand sides and an LU factorization algorithm. These solvers are highly parallel, and they perform a factor of 0.4P^{1/6} less communication than existing algorithms, where P is the number of processors. The new solvers reduce communication at the expense of using more temporary storage. Previously, algorithms that reduce communication by using more memory were known only for matrix multiplication. Our algorithms are recursive, elegant, and relatively simple to implement. We have implemented them using MPI, a message-passing library, and tested them on a cluster of workstations.
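To get a feel for the stated factor, the short sketch below evaluates 0.4P^{1/6} for a few processor counts. It assumes the factor can be read literally as the ratio of communication in existing algorithms to communication in the new solvers; the function names `communication_ratio` and `break_even_procs` are hypothetical and used only for illustration.

```python
def communication_ratio(num_procs: int) -> float:
    """Evaluate 0.4 * P**(1/6), the factor quoted in the abstract.

    Assumption: the value is read as (existing communication) / (new communication),
    so values above 1 mean the new solvers communicate less.
    """
    if num_procs < 1:
        raise ValueError("num_procs must be a positive integer")
    return 0.4 * num_procs ** (1.0 / 6.0)


def break_even_procs() -> float:
    """Processor count at which 0.4 * P**(1/6) equals 1, i.e. P = (1 / 0.4)**6."""
    return (1.0 / 0.4) ** 6


if __name__ == "__main__":
    for p in (64, 256, 4096, 10**6):
        print(f"P = {p:>7}: factor = {communication_ratio(p):.2f}")
    print(f"factor reaches 1 near P = {break_even_procs():.0f}")
```

Under this literal reading, the factor exceeds 1 only once P is larger than roughly 244, so the saving grows slowly with machine size and is most relevant on large processor counts.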
Related resources
Efficient Parallel Solvers for Large Dense Systems of Linear Interval Equations
Verified solvers for dense linear (interval-)systems require a lot of resources, both in terms of computing power and memory usage. Computing a verified solution of large dense linear systems (dimension n > 10000) on a single machine quickly approaches the limits of today’s hardware. Therefore, an efficient parallel verified solver for distributed memory systems is needed. In this work we prese...
High Performance Computing Benchmark Tool for Parallel Processing of Large Models
Benchmarks for the parallel processing of large models are urgently needed in High Performance Computing (HPC), as today’s model sizes reach millions of degrees of freedom. Explicit solvers, as in crash dynamics or fluid dynamics, do not require matrix-based equation solvers and inherently scale well on large numbers of processors. Whereas analysis requiring implicit solvers...
Parallel Randomized and Matrix-Free Direct Solvers for Large Structured Dense Linear Systems
We design efficient and distributed-memory parallel randomized direct solvers for large structured dense linear systems, including a fully matrix-free version based on matrix-vector multiplications and a partially matrix-free one. The dense coefficient matrix A has an off-diagonal low-rank structure, as often encountered in practical applications such as Toeplitz systems and discretized integra...
Fast (Parallel) Dense Linear System Solvers in C-XSC Using Error Free Transformations and BLAS
Existing self-verifying solvers for dense linear (interval-)systems in C-XSC provide high accuracy, but are rather slow. A new set of solvers is presented, which are much faster than the existing solvers without losing too much accuracy. This is achieved through two main changes. First, an alternative method for the computation of exact dot products based on the DotK-Algorithm is implemented. ...
A Message-Passing Distributed Memory Parallel Algorithm for a Dual-Code Thin Layer, Parabolized Navier-Stokes Solver
In this study, the results of parallelization of a 3-D dual code (Thin Layer, Parabolized Navier-Stokes solver) for solving supersonic turbulent flow around body and wing-body combinations are presented. As a serial code, TLNS solver is very time consuming and takes a large part of memory due to the iterative and lengthy computations. Also for complicated geometries, an exceeding number of grid...
Journal title:
- Parallel Processing Letters
Volume 12
Publication date: 2002